Generalized Rank Annihilation Factor Analysis


The analytical chemist is frequently confronted with the problem of
analyzing complex mixtures for which only the concentrations of a few
components are of interest. In these cases, it is desirable to be able to
obtain quantitative information for the analytes of interest without concern
for the rest of the components in the sample. Second-order bilinear sensors,
i.e. sensors that yield a two-dimensional data matrix of the form
$M_{ij} = \sum_k \beta_k x_{ik} y_{jk}$, are especially suited for this
purpose, and the preferred technique for quantitation is known as rank
annihilation factor analysis, RAFA (1,2). So far this method has been applied
to excitation-emission fluorescence (1-3), LC/UV (4) and TLC-reflectance
imaging spectrophotometry (5) with good results. It is important to realize
that not all two-dimensional techniques yield bilinear data arrays: e.g.
2D-NMR or MS/MS data in their raw forms are not bilinear.


A limitation of rank annihilation as originally formulated is that an
iterative solution requiring many matrix diagonalizations is necessary (1).
Lorber (6) has reported a non-iterative solution that presents the problem
as a generalized eigenvalue-eigenvector equation, for which a direct solution
is found by using the singular value decomposition. With his method, to
obtain the concentrations of the p analytes of interest in the sample, its
bilinear spectrum and the p calibration spectra, one for each pure analyte,
must be recorded. Analysis for each analyte requires a separate calculation.
This letter presents the generalized rank annihilation method, of which
Lorber's non-iterative method is only a particular case.


[...] one bilinear calibration spectrum obtained from a mixture of standards,
one standard for each analyte.

Generalized rank annihilation can determine the bilinear spectrum
and the relative concentration for each analyte in the unknown mixture. The
calculated spectra are next matched to those of the standards. It is then
straightforward to determine the actual concentration of each analyte from
its relative concentration and the concentration of the corresponding
standard. The full bilinear spectrum of each analyte is not actually required
for identification. One need only use a single order (e.g. only the UV
spectrum in the LC/UV case) for the match. This is an unusual type of
analysis because, in most cases, analyte concentrations are estimated one at
a time, thereby precluding identification.

THEORY  AND  DISCUSSION 

Any bilinear data matrix M can be expressed as a linear combination
of the n pure-component bilinear spectra $M_k$:

$$M = \sum_{k=1}^{n} \beta_k M_k \quad \text{where} \quad M_k = x_k y_k^T \qquad (1)$$

The $x_k$ are column vectors with information in one order, e.g. excitation
spectra, and the $y_k^T$ are row vectors with information in the second order,
e.g. emission spectra. If we define the $M_k$ as unit-concentration,
pure-component bilinear spectra, $\beta_k$ is the concentration of the kth
compound in M. We can rewrite eq 1 in matrix notation as

$$M = X \beta Y^T \qquad (2)$$

where X is a matrix whose columns are the n $x_k$ vectors, $Y^T$ is a matrix
whose rows are the n $y_k^T$ vectors and $\beta$ is a diagonal matrix with
diagonal elements that are the concentrations, $\beta_k$.

In general we will have two data matrices, the unknown
concentrations data matrix M and the calibration data matrix N. The bilinear
calibration data matrix N can similarly be represented in matrix notation as

$$N = X \xi Y^T \qquad (3)$$

where X and $Y^T$ are the same matrices defined for eq 2 and $\xi$ is a
diagonal matrix whose diagonal elements are the concentrations $\xi_k$ for
the calibration matrix.

The matrices M and N have in common the X and $Y^T$ blocks, e.g. the
excitation and emission spectra are the same, differing only in their
concentration matrices, $\beta$ and $\xi$ respectively. Therefore, solving
for X in eq 2 and eq 3 we obtain

$$X \beta = M (Y^T)^+ \qquad (4)$$

$$X \xi = N (Y^T)^+ \qquad (5)$$

where $(Y^T)^+$ represents the pseudoinverse (7) of the matrix $Y^T$. Now we
right-multiply eq 4 by $\xi$ and eq 5 by $\beta$ and combine, using the fact
that the diagonal matrices $\beta$ and $\xi$ commute, to get:

$$N (Y^T)^+ \beta = M (Y^T)^+ \xi \qquad (6)$$

Defining $Z = (Y^T)^+$,

$$N Z \beta = M Z \xi \qquad (7)$$
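As a numerical illustration of eq 7, the following minimal sketch builds synthetic bilinear matrices and checks that both sides of the equation agree. The dimensions, the random profiles and the concentration values are assumptions chosen only for illustration; none of them come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 3                              # number of components (assumed)
X = rng.random((20, n))            # columns x_k: first-order profiles, e.g. excitation
Y = rng.random((15, n))            # columns y_k: second-order profiles, e.g. emission
beta = np.diag([1.0, 2.0, 0.5])    # concentrations in the unknown sample (assumed)
xi = np.diag([0.8, 1.5, 1.0])      # concentrations in the calibration sample (assumed)

M = X @ beta @ Y.T                 # eq 2
N = X @ xi @ Y.T                   # eq 3

Z = np.linalg.pinv(Y.T)            # Z = (Y^T)^+, the pseudoinverse used in eqs 4-7

lhs = N @ Z @ beta                 # left-hand side of eq 7
rhs = M @ Z @ xi                   # right-hand side of eq 7
print(np.allclose(lhs, rhs))
```

The check relies on $Y^T$ having full row rank, so that $Y^T Z$ is the identity; with noisy data the two sides would agree only approximately.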


We know only M and N, and thus we must solve for Z and $\beta$. Eq 7 is
similar to the generalized eigenvalue-eigenvector problem, but cannot be
solved by standard methods since N and M are not necessarily square matrices.
A solution of this equation will be discussed in the next sections, for the
following different possible cases:
[1] The calibration data matrix N has just one component, which is
present in the sample data matrix M,

$$\mathrm{diagonal}(\beta) = \{\beta_1, \beta_2, \ldots, \beta_n\} \quad n > 1 \qquad (8)$$

$$\mathrm{diagonal}(\xi) = \{\xi_1, 0, \ldots, 0\} \qquad (9)$$

This is the standard RAFA problem as discussed by Lorber (6).

[2] The calibration data matrix N has several components, which are a
subset of the components present in the sample data matrix M,

$$\mathrm{diagonal}(\beta) = \{\beta_1, \beta_2, \ldots, \beta_r, \beta_{r+1}, \beta_{r+2}, \ldots, \beta_{r+s}\} \quad r > 0 \qquad (10)$$

$$\mathrm{diagonal}(\xi) = \{0, 0, \ldots, 0, \xi_{r+1}, \ldots, \xi_{r+s}\} \quad s \ge 1 \qquad (11)$$

Here, r is the number of components in the sample M that are not present in
the calibration N, and s is the number of common components.

[3] The components in the sample data matrix M are a subset of the
components present in the calibration data matrix N,

$$\mathrm{diagonal}(\beta) = \{\beta_1, \beta_2, \ldots, \beta_s, 0, \ldots, 0\} \quad s > 1 \qquad (12)$$

$$\mathrm{diagonal}(\xi) = \{\xi_1, \xi_2, \ldots, \xi_s, \xi_{s+1}, \ldots, \xi_{s+t}\} \quad t > 1 \qquad (13)$$

Again, s is the number of common components, and t is the number of
components in the calibration N that are absent from the sample M.

[4] The most general case would be when there are analytes in the
unknown sample that are not present in the calibration sample and vice
versa,

$$\mathrm{diagonal}(\beta) = \{\beta_1, \beta_2, \ldots, \beta_r, \beta_{r+1}, \beta_{r+2}, \ldots, \beta_{r+s}, 0, \ldots, 0\} \qquad (14)$$

$$\mathrm{diagonal}(\xi) = \{0, 0, \ldots, 0, \xi_{r+1}, \xi_{r+2}, \ldots, \xi_{r+s+t}\} \qquad (15)$$

Here, r is the number of components in the sample M that are not present in
the calibration N, s is the number of common components, and t is the
number of components in the calibration data matrix N that are absent from
the unknown sample M.


[1] FIRST CASE: One Component Quantitation

In this case, the calibration data matrix N has just one component,
$N_k$, that is also present in the sample data matrix. The solution for this
case has been reported by Lorber (6) and is included here for
completeness.

The first step in solving eq 7 is to apply principal components
analysis (8) to the sample matrix M, and then express the matrices in terms
of these principal components. The principal components of M are obtained
by applying singular value decomposition (7)

$$M = U S V^T \qquad (16)$$

where

$$M V = U S \qquad (17)$$

$$M^T U = V S \qquad (18)$$

$$M^T M V = V S^2 \quad \text{(eigen-equations in V space)} \qquad (19)$$

$$M M^T U = U S^2 \quad \text{(eigen-equations in U space)} \qquad (20)$$


The next step is to estimate the number of principal components
that are significant using abstract factor analysis (8) or cross validation
(9,10). In the ideal case, this number is equal to the number of components n
in the sample mixture. Keeping only the significant principal components
reduces M to the deterministic information it contains, with random error
discarded in the lesser factors. To do this, a new matrix $\bar{M}$ is
generated from the first n "significant" columns of U and V and the upper
left n by n part of S, denoted $\bar{U}$, $\bar{V}$ and $\bar{S}$,

$$\bar{M} = \bar{U} \bar{S} \bar{V}^T \qquad (21)$$
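The truncation in eq 21 can be sketched numerically as follows. This is a minimal illustration on synthetic data: the number of significant factors n is simply assumed to be known here, whereas the text estimates it by abstract factor analysis or cross validation, which this sketch does not implement. The noise level and matrix sizes are likewise assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 2                                        # number of significant factors (assumed known)
X = rng.random((12, n))
Y = rng.random((10, n))
M_clean = X @ np.diag([1.0, 0.7]) @ Y.T      # noise-free bilinear matrix
M = M_clean + 1e-4 * rng.standard_normal(M_clean.shape)   # add small random error

# Singular value decomposition of the measured matrix (eq 16)
U, s, Vt = np.linalg.svd(M, full_matrices=False)

# Keep the first n columns of U and V and the leading n-by-n block of S
U_bar = U[:, :n]
S_bar = np.diag(s[:n])
V_bar = Vt[:n, :].T

# Reduced matrix of eq 21; the discarded factors carry mostly noise
M_bar = U_bar @ S_bar @ V_bar.T

print("singular values:", s)
print("residual norm  :", np.linalg.norm(M - M_bar))
```

Because the truncated SVD is the best rank-n approximation in the Frobenius norm, the residual printed above cannot exceed the norm of the added noise.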

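Now eq 7 can be rewritten as

$$N Z \beta = \bar{M} Z \xi = \bar{U} \bar{S} \bar{V}^T Z \xi \qquad (22)$$

If we substitute $Z = \bar{V} \bar{S}^{-1} Z^*$, where $Z^* = \bar{S} \bar{V}^T Z$,

$$N (\bar{V} \bar{S}^{-1} Z^*) \beta = \bar{U} \bar{S} \bar{V}^T (\bar{V} \bar{S}^{-1} Z^*) \xi \qquad (23)$$

Using the orthogonality properties of $\bar{V}$, $\bar{V}^T \bar{V} = I$, the
n by n identity matrix, so:

$$N \bar{V} \bar{S}^{-1} Z^* \beta = \bar{U} \bar{S} I \bar{S}^{-1} Z^* \xi = \bar{U} (\bar{S} \bar{S}^{-1}) Z^* \xi \qquad (24a)$$

which reduces to

$$(N \bar{V} \bar{S}^{-1}) Z^* \beta = \bar{U} Z^* \xi \qquad (24b)$$

Left-multiplying by $\bar{U}^T$ and right-multiplying by $\beta^{-1}$ gives

$$(\bar{U}^T N \bar{V} \bar{S}^{-1}) Z^* \beta \beta^{-1} = (\bar{U}^T \bar{U}) Z^* \xi \beta^{-1} = Z^* \lambda, \quad \lambda = \xi \beta^{-1} \qquad (25a)$$

or, finally,

$$(\bar{U}^T N \bar{V} \bar{S}^{-1}) Z^* = Z^* \lambda \qquad (25b)$$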

which is the usual eigenvalue-eigenvector equation, because the matrix
$(\bar{U}^T N_k \bar{V} \bar{S}^{-1})$ is square. The eigenvectors $Z^*$ are
not perpendicular because the matrix $(\bar{U}^T N_k \bar{V} \bar{S}^{-1})$ is
not symmetric. Because the rank of N is one, there will be n-1 zero solutions
for the eigenvalues $\lambda_k$. Therefore, the only non-zero solution will be
equal to the trace of the matrix $(\bar{U}^T N_k \bar{V} \bar{S}^{-1})$. By
calculating the trace of this matrix, i.e. $\lambda_k$, the concentration
$\beta_k$ of the kth component is solved directly as
$\beta_k = \xi_k/\lambda_k$.
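The one-component calculation can be sketched as below. This is a noise-free illustration under assumed values: the profiles are random, the concentrations are arbitrary, and the variable names are hypothetical. It follows the trace shortcut described above rather than any particular published implementation.

```python
import numpy as np

rng = np.random.default_rng(2)

n = 3                                   # components in the unknown sample (assumed)
X = rng.random((20, n))
Y = rng.random((15, n))
beta = np.array([1.0, 2.0, 0.5])        # true unknown concentrations (to be recovered)
k = 1                                   # index (0-based) of the calibrated component
xi_k = 1.5                              # known concentration of the standard

M = X @ np.diag(beta) @ Y.T             # sample data matrix
N_k = xi_k * np.outer(X[:, k], Y[:, k]) # rank-one calibration data matrix

# Principal components of M, truncated to n factors
U, s, Vt = np.linalg.svd(M, full_matrices=False)
U_bar, V_bar = U[:, :n], Vt[:n, :].T
S_inv = np.diag(1.0 / s[:n])

# Square matrix of eq 25b; its only non-zero eigenvalue equals its trace
A = U_bar.T @ N_k @ V_bar @ S_inv
lam_k = np.trace(A)

beta_k = xi_k / lam_k                   # concentration of component k in the unknown
print(beta_k)                           # close to beta[k] = 2.0 for this noise-free example
```

With real data the trace also accumulates noise from the n-1 eigenvalues that are only approximately zero, so the result is an estimate rather than an exact value.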

If the unknown sample does not have the component that is present
in the calibration sample, we cannot expand N in terms of X and Y (eq 3),
and therefore eq 25b is not valid. This is an example of the fourth case
introduced in the previous section, which will be considered later in this
paper. In practice, a non-zero concentration value will still be obtained, so
the validity of eq 3 must be verified before applying eq 25b. Using target
factor analysis (8,11), modified for bilinear data, it is possible to check
whether N is included in M (see appendix for the details of bilinear target
factor analysis). The projection matrices $\bar{U} \bar{U}^T$ and
$\bar{V} \bar{V}^T$ should leave N unchanged:

$$\bar{U} \bar{U}^T N \bar{V} \bar{V}^T = N \qquad (26)$$
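A minimal sketch of the projection test in eq 26 is given below. The function name `passes_bilinear_tfa` and the relative tolerance `rtol` are hypothetical choices made for this illustration; with noisy measurements the tolerance would have to be set from the expected error level.

```python
import numpy as np

def passes_bilinear_tfa(M, N, n, rtol=1e-8):
    """Return True if projecting N onto the n-factor spaces of M leaves N unchanged (eq 26)."""
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    U_bar, V_bar = U[:, :n], Vt[:n, :].T
    N_proj = U_bar @ U_bar.T @ N @ V_bar @ V_bar.T
    # Compare the residual with the size of N itself
    return np.linalg.norm(N - N_proj) <= rtol * np.linalg.norm(N)

rng = np.random.default_rng(3)
X = rng.random((20, 4))
Y = rng.random((15, 4))

# The sample contains components 0, 1 and 2 only
M = X[:, :3] @ np.diag([1.0, 2.0, 0.5]) @ Y[:, :3].T

N_in = 1.5 * np.outer(X[:, 1], Y[:, 1])    # standard of a component present in M
N_out = 1.5 * np.outer(X[:, 3], Y[:, 3])   # standard of a component absent from M

print(passes_bilinear_tfa(M, N_in, 3))
print(passes_bilinear_tfa(M, N_out, 3))
```

In this synthetic example the first standard passes the test and the second fails it, which is the situation in which eq 25b should not be applied.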

As pointed out by Lorber (6), if the calibration matrix N has more
than one component, i.e. its rank is greater than one, several solutions will
be obtained for the concentrations $\beta_k$, but there will be no way to
match which concentration corresponds to which chemical component. The
proposed alternative is to obtain the spectrum of each component
separately, and estimate their concentrations one by one. A solution to this
problem is described in the next section, using the eigenvector matrix Z in
eq 7, which was defined as the pseudoinverse of $Y^T$, i.e. the generalized
inverse of the pure components' emission spectra.

[2] SECOND CASE: Simultaneous Quantitation of Several Components

In this case, the calibration data matrix N has several components
that are a subset of the components present in the sample data matrix M.

First, it is necessary to check that the components in N are a
subset of the components present in the sample data matrix M by applying
bilinear target factor analysis to the matrix N, i.e. eq 26 should hold.

If more than one component is represented in the calibration
matrix, eq 25b has several non-zero eigenvalues. The solution is a set of
eigenvalues $\lambda$ and their corresponding eigenvectors $Z^*$. The
eigenvectors allow us to calculate the pure spectra matrices X and $Y^T$,
e.g. excitation and emission spectra:

$$Z^* = \bar{S} \bar{V}^T Z = \bar{S} \bar{V}^T (Y^T)^+ \qquad (27)$$

$$Y^T = (\bar{V} \bar{S}^{-1} Z^*)^+ \qquad (28)$$

Using the definition of $M = X \beta Y^T = \bar{U} \bar{S} \bar{V}^T$,

$$X \beta = M (Y^T)^+ = \bar{U} \bar{S} \bar{V}^T \bar{V} \bar{S}^{-1} Z^* = \bar{U} Z^* \qquad (29)$$

The eigenvalues $\lambda_k$ are the ratios of concentrations
$\xi_k/\beta_k$ for each component, i.e. calibration/unknown. Having the pure
spectra $x_k$ or $y_k^T$, it is easy to match which concentration $\xi_k$
corresponds to which ratio $\lambda_k$; therefore the concentrations
$\beta_k$ can be estimated as $\beta_k = \xi_k/\lambda_k$.
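The second case can be sketched end to end on synthetic, noise-free data. Everything numerical here is an assumption made for illustration, and the cosine-based matching of resolved spectra to the standards is one simple choice among several; the text only states that the spectra are matched, not how.

```python
import numpy as np

rng = np.random.default_rng(4)

n = 3
X = rng.random((20, n))
Y = rng.random((15, n))
beta = np.array([1.0, 2.0, 0.5])     # unknown sample concentrations (to be recovered)
xi = np.array([0.0, 1.5, 1.0])       # calibration mixture holds components 1 and 2 only

M = X @ np.diag(beta) @ Y.T
N = X @ np.diag(xi) @ Y.T

# Principal components of the sample matrix, truncated to n factors
U, s, Vt = np.linalg.svd(M, full_matrices=False)
U_bar, V_bar = U[:, :n], Vt[:n, :].T
S_inv = np.diag(1.0 / s[:n])

# Eigenvalue problem of eq 25b; eigenvalues are xi_k / beta_k
lam, Z_star = np.linalg.eig(U_bar.T @ N @ V_bar @ S_inv)
lam, Z_star = lam.real, Z_star.real

# Resolved second-order profiles (eq 28); each row is some y_k^T up to a scale factor
Yt_est = np.linalg.pinv(V_bar @ S_inv @ Z_star)

def cosine(a, b):
    """Absolute cosine between two vectors; insensitive to scale and sign."""
    return abs(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

beta_est = {}
for k in np.flatnonzero(xi):                       # components present in the calibration
    j = max(range(n), key=lambda i: cosine(Yt_est[i], Y[:, k]))
    beta_est[int(k)] = xi[k] / lam[j]              # beta_k = xi_k / lambda_k

print(beta_est)
```

The matching step assumes that a single-order spectrum of each standard (here the columns of `Y`) is available, as described in the introduction.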

[3] THIRD CASE: Calibration as a Base

When the components in the sample data matrix M are a subset of the
components in the calibration N, we must invert the procedure. The principal
components of the matrix M do not form a basis for the representation of the
matrix N, therefore eq 25b is not valid in this case. The principal
components of N are estimated,
$N = \bar{U}_N \bar{S}_N \bar{V}_N^T$, and equations similar to eq 25b, 28
and 29 are obtained:

$$(\bar{U}_N^T M \bar{V}_N \bar{S}_N^{-1}) Z_N^* = Z_N^* \lambda_N \qquad (30)$$

$$Y^T = (\bar{V}_N \bar{S}_N^{-1} Z_N^*)^+ \qquad (31)$$

$$X \xi = \bar{U}_N Z_N^* \qquad (32)$$

The eigenvalues $(\lambda_N)_k$ are not defined as they were before. Now the
$(\lambda_N)_k$ are the ratios of concentrations $\beta_k/\xi_k$ for each
component, i.e. unknown sample/calibration.
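For readers who want the intermediate steps, the eigenvalue equation of the third case follows from eq 7 in the same way as eq 25b, with the roles of M and N exchanged. The sketch below assumes that every calibration concentration $\xi_k$ is non-zero, so that $\xi^{-1}$ exists, and it writes the substitution as $Z = \bar{V}_N \bar{S}_N^{-1} Z_N^*$ by analogy with eq 23; that substitution is an assumption, since the text does not spell it out.

```latex
\begin{align*}
M Z \xi &= N Z \beta
  && \text{(eq 7)} \\
M \bar{V}_N \bar{S}_N^{-1} Z_N^* \, \xi
  &= \bar{U}_N \bar{S}_N \bar{V}_N^T \bar{V}_N \bar{S}_N^{-1} Z_N^* \, \beta
   = \bar{U}_N Z_N^* \, \beta
  && \text{(substitute } Z \text{ and } N\text{)} \\
(\bar{U}_N^T M \bar{V}_N \bar{S}_N^{-1}) Z_N^*
  &= Z_N^* \, \beta \xi^{-1} = Z_N^* \lambda_N ,
  \qquad (\lambda_N)_k = \beta_k / \xi_k
  && \text{(left-multiply by } \bar{U}_N^T \text{, right-multiply by } \xi^{-1}\text{)}
\end{align*}
```

A component that is absent from the unknown sample has $\beta_k = 0$ and therefore gives a zero eigenvalue.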

Bilinear target factor analysis can be used to test instances of the
third case. The projection of the matrix M in the spaces defined by N should
leave M unchanged:

$$\bar{U}_N \bar{U}_N^T M \bar{V}_N \bar{V}_N^T = M \qquad (33)$$

If both this test and eq 26 fail, then we are dealing with the fourth case,
discussed in the next section. In practice, the third case can be solved using
principal components analysis or multiple linear regression, because the
spectra of all the components are known.

[4] FOURTH CASE: The General Condition

In this case, the calibration sample will have some components that
are not present in the unknown sample, and there will be some components
in the unknown sample not present in the calibration sample. Projection of
one matrix onto the principal components of the other matrix will change
its information; eq 25b and eq 30 will not be valid.

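A solution to this problem can be obtained using the principal
components of the sum of the matrices M and N, defining W = M + N,

$$W = \bar{U}_W \bar{S}_W \bar{V}_W^T \qquad (34)$$

$$(\bar{U}_W^T M \bar{V}_W \bar{S}_W^{-1}) Z_W^* = Z_W^* \lambda_W \qquad (35)$$

$$Y^T = (\bar{V}_W \bar{S}_W^{-1} Z_W^*)^+ \qquad (36)$$

$$X (\beta + \xi) = \bar{U}_W Z_W^* \qquad (37)$$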

The eigenvalues $\lambda_k$ are the ratios of concentrations
$\beta_k/(\xi_k + \beta_k)$. For all the components present in both mixtures,
the concentration in the unknown is
$\beta_k = \lambda_k \xi_k/(1 - \lambda_k)$. When one component is not present
in the calibration sample, $\xi_k = 0$ and $\lambda_k = 1$.
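The general case can be sketched in the same spirit, using the sum matrix W as the basis. The data are synthetic and noise-free, all numerical values are assumptions, and the cosine-based matching of resolved profiles to the standards is a hypothetical choice made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

n = 4                                     # distinct components across both samples (assumed)
X = rng.random((20, n))
Y = rng.random((15, n))
beta = np.array([1.0, 2.0, 0.5, 0.0])     # component 3 is absent from the unknown
xi = np.array([0.0, 1.5, 1.0, 0.7])       # component 0 is absent from the calibration

M = X @ np.diag(beta) @ Y.T
N = X @ np.diag(xi) @ Y.T
W = M + N                                 # basis matrix of eq 34

U, s, Vt = np.linalg.svd(W, full_matrices=False)
U_bar, V_bar = U[:, :n], Vt[:n, :].T
S_inv = np.diag(1.0 / s[:n])

# Eigenvalue problem of eq 35; eigenvalues are beta_k / (xi_k + beta_k)
lam, Z_star = np.linalg.eig(U_bar.T @ M @ V_bar @ S_inv)
lam, Z_star = lam.real, Z_star.real

# Resolved second-order profiles (eq 36), each row known up to a scale factor
Yt_est = np.linalg.pinv(V_bar @ S_inv @ Z_star)

def cosine(a, b):
    """Absolute cosine between two vectors; insensitive to scale and sign."""
    return abs(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

beta_est = {}
for k in np.flatnonzero((xi > 0) & (beta > 0)):    # components common to both samples
    j = max(range(n), key=lambda i: cosine(Yt_est[i], Y[:, k]))
    beta_est[int(k)] = lam[j] * xi[k] / (1.0 - lam[j])

print(np.sort(lam))
print(beta_est)
```

The formula is applied only to components with a non-zero calibration concentration, because an eigenvalue of 1 (a component absent from the calibration) would make the denominator vanish. In practice the set of common components is not known in advance; here it is taken from the simulated concentrations purely to keep the sketch short.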

The solution presented for this case can be applied to all the
previous cases, and no testing with target factor analysis is necessary. An
artificial matrix W is generated to perform the calculations. This suggests
that one could instead generate the W matrix simply by making a single
standard addition, containing known amounts of all analytes, to the unknown
sample. In this way the calibration mixture is added to the unknown
mixture, and the W matrix is measured directly. Quantitation by RAFA with
the standard addition method (SAM) has been discussed by Lorber (14) for
single analyte addition. This procedure would extend the applicability of his
method to the quantitation of several analytes at a time, correcting for
matrix effects, and thereby represents an extension of the generalized
standard addition method, GSAM (12,13), to second-order tensor data.

If we have several calibration matrices $N_1, N_2, \ldots, N_q$, we can apply
the method to all of them, one at a time, or we can handle it as a three-way
factor analysis problem, using all of the information in one calculation. We
are currently working on this problem, which will be the subject of another
publication.


APPENDIX 


Target factor analysis (8,11) can be applied to bilinear data in a
similar way to that used for one-dimensional data. For the test vectors
$x_i$ or $y_i^T$, TFA can be expressed as:

$$\bar{U} \bar{U}^T x_i = \hat{x}_i \quad \text{or} \quad y_i^T \bar{V} \bar{V}^T = \hat{y}_i^T \qquad (38)$$

Every test vector $x_i$ or $y_i$ generates a predicted target vector
$\hat{x}_i$ or $\hat{y}_i$. If the test vectors are present in the matrix M,
i.e. if the ith component, whose spectrum is $x_i y_i^T$, is present in M,
then the predicted target vectors should be equal to the test vectors:
$x_i = \hat{x}_i$; $y_i = \hat{y}_i$; therefore

$$\bar{U} \bar{U}^T x_i = x_i \quad \text{or} \quad y_i^T \bar{V} \bar{V}^T = y_i^T \qquad (39)$$

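Using the definition of X and Y we can similarly write

$$\bar{U} \bar{U}^T X = X \quad \text{and} \quad Y^T \bar{V} \bar{V}^T = Y^T \qquad (40)$$

Now, if $N = X \xi Y^T$, then

$$\bar{U} \bar{U}^T N \bar{V} \bar{V}^T = (\bar{U} \bar{U}^T X) \xi (Y^T \bar{V} \bar{V}^T) = X \xi Y^T = N \qquad (41)$$

that is,

$$\bar{U} \bar{U}^T N \bar{V} \bar{V}^T = N \qquad (42)$$

This equation defines bilinear target factor analysis. Note that

$$\bar{U} \bar{U}^T N = N \quad \text{and} \quad N \bar{V} \bar{V}^T = N \qquad (43)$$

In practice, due to random noise, equations 42-44 are approximate.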